71 research outputs found

    From Adversarial Arms Race to Model-centric Evaluation: Motivating a Unified Automatic Robustness Evaluation Framework

    Textual adversarial attacks can discover models' weaknesses by adding semantics-preserving but misleading perturbations to the inputs. The long-lasting adversarial attack-and-defense arms race in Natural Language Processing (NLP) is algorithm-centric, providing valuable techniques for automatic robustness evaluation. However, the existing practice of robustness evaluation may suffer from incomplete evaluation, impractical evaluation protocols, and invalid adversarial samples. In this paper, we aim to set up a unified automatic robustness evaluation framework, shifting towards model-centric evaluation to further exploit the advantages of adversarial attacks. To address the above challenges, we first determine robustness evaluation dimensions based on model capabilities and specify a reasonable algorithm to generate adversarial samples for each dimension. Then we establish the evaluation protocol, including evaluation settings and metrics, under realistic demands. Finally, we use the perturbation degree of adversarial samples to control sample validity. We implement a toolkit, RobTest, that realizes our automatic robustness evaluation framework. In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further show the rationality of each component in the framework. The code will be made public at https://github.com/thunlp/RobTest. Comment: Accepted to Findings of ACL 202

    OWL: A Large Language Model for IT Operations

    With the rapid development of IT operations, it has become increasingly crucial to efficiently manage and analyze large volumes of data for practical applications. The techniques of Natural Language Processing (NLP) have shown remarkable capabilities for various tasks, including named entity recognition, machine translation, and dialogue systems. Recently, Large Language Models (LLMs) have achieved significant improvements across various NLP downstream tasks. However, there is a lack of specialized LLMs for IT operations. In this paper, we introduce OWL, a large language model trained on our collected OWL-Instruct dataset, which covers a wide range of IT-related information; a mixture-of-adapter strategy is proposed to improve parameter-efficient tuning across different domains or tasks. Furthermore, we evaluate the performance of OWL on OWL-Bench, a benchmark we established, and on open IT-related benchmarks. OWL demonstrates superior performance on IT tasks, outperforming existing models by significant margins. Moreover, we hope that the findings of our work will provide more insights to revolutionize the techniques of IT operations with specialized LLMs. Comment: 31 page

    Effects of Main Meteorological Indicators on Eating Quality of Rice in Lower Reaches of the Huai River

    The main meteorological indicators affecting the eating quality of rice (Oryza sativa L.) in the lower reaches of the Huai River were studied, and the optimal sowing time range for obtaining good eating quality was put forward. Compared with solar radiation, rainfall, and humidity, temperature is the primary meteorological factor affecting the eating quality of rice in the lower reaches of the Huai River. Sowing the rice on different dates altered the heading and maturity dates, and the resulting difference in the mean daily temperature (Tmean) from heading to maturity reached 4.6–5.0 °C. The Tmean from heading to maturity for all treatments was less than 23.5 °C. When the temperature was lower than 20.2 °C during the grain filling period, the value of the comprehensive evaluation of eating quality (CEQ) of the three types of rice decreased significantly. The medium-maturing japonica soft rice varieties (SMR), late-maturing japonica soft rice varieties (SLR), and late-maturing japonica non-soft rice varieties (LR) that were subjected to low temperatures had higher amylose and protein contents. Overall, the eating quality of rice in the lower reaches of the Huai River was affected by the low Tmean after the heading stage. The Tmean ranges from heading to maturity that produced relatively high CEQ for SMR, SLR, and LR varieties were 20.2–23.3 °C, 20.2–22.1 °C, and 20.3–22.1 °C, respectively. The optimal sowing date ranges of SMR, SLR, and LR were 16 May to 1 June, 16 to 18 May, and 16 to 20 May, respectively.

    PET imaging agents targeting macrophage surface receptors
